An Approach to Feature Selection for Continuous Features of Objects

Authors

  • Hongwei Wang
  • Guohe Li
  • Weijiang Wu
  • Xue Li
Abstract

A novel approach to feature selection is proposed for data spaces defined over continuous features. The approach obtains a subset of features that can discriminate the class labels of objects, with a discriminant ability superior or equivalent to that of the original features, and so improves the learning performance and intelligibility of the classification model. According to the spatial distribution of objects and their classification labels, the data space is partitioned into subspaces, each with a clear boundary and a single classification label. These labelled subspaces are then projected onto each continuous feature. For each subspace, a measurement of each feature is estimated against the projections of all other subspaces by means of statistical significance. The measurements of all subspaces under all features are assembled into a matrix, and the features are ranked in descending order of their discriminant ability in that matrix. The resulting feature subset is then determined incrementally by evaluating a gain function of the discriminant ability defined on the best-so-far feature subset. Comprehensive experiments on UCI Repository data sets demonstrate that this subspace-based feature ranking and feature selection greatly improves the effectiveness and efficiency of classification on continuous features.
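The abstract does not give the partitioning rule, the significance test, or the gain function, so the following is only a rough sketch of the pipeline's shape under loudly stated assumptions: each class label is treated as a single subspace, the measurement is the absolute Welch t statistic of a subspace's projection against all other objects, features are ranked by their mean measurement, and the gain is the increase in training accuracy of a nearest-centroid classifier. All function names and the `min_gain` parameter are hypothetical and not taken from the paper.

```python
import numpy as np

def measurement_matrix(X, y):
    """Rows are labelled subspaces (ASSUMPTION: one per class); columns are features."""
    labels = np.unique(y)
    M = np.zeros((len(labels), X.shape[1]))
    for i, c in enumerate(labels):
        inside, outside = X[y == c], X[y != c]
        # ASSUMPTION: absolute Welch t statistic per feature, comparing this
        # subspace's projection with the projections of all other subspaces.
        diff = inside.mean(axis=0) - outside.mean(axis=0)
        se = np.sqrt(inside.var(axis=0, ddof=1) / len(inside)
                     + outside.var(axis=0, ddof=1) / len(outside))
        M[i] = np.abs(diff) / np.maximum(se, 1e-12)
    return M

def rank_features(M):
    # ASSUMPTION: a feature's discriminant ability is its mean measurement
    # over all subspaces; features are returned in descending order.
    return np.argsort(-M.mean(axis=0))

def subset_score(X, y, features):
    """Training accuracy of a nearest-centroid classifier on the given features."""
    labels = np.unique(y)
    Z = X[:, features]
    centroids = np.stack([Z[y == c].mean(axis=0) for c in labels])
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((labels[dists.argmin(axis=1)] == y).mean())

def select_features(X, y, min_gain=0.01):
    """Walk the ranking and keep a feature only if it raises the score enough."""
    ranking = rank_features(measurement_matrix(X, y))
    # Baseline: accuracy of always predicting the majority class.
    best = np.unique(y, return_counts=True)[1].max() / len(y)
    selected = []
    for f in ranking:
        score = subset_score(X, y, selected + [int(f)])
        if score - best > min_gain:  # gain over the best-so-far subset
            selected.append(int(f))
            best = score
    return ranking, selected

# Synthetic demo: feature 0 separates the two classes, features 1 and 2 are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 3))
X[:, 0] += 10.0 * y
ranking, selected = select_features(X, y)
print(ranking, selected)
```

On this toy data the informative feature should be ranked first and selected on its own, since the noise features add no accuracy. A faithful implementation would replace the per-class grouping with the paper's spatial partitioning into single-label subspaces and use the authors' own measurement and gain definitions.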


Similar resources

A hybrid filter-based feature selection method via hesitant fuzzy and rough sets concepts

High dimensional microarray datasets are difficult to classify since they have many features with a small number of instances and an imbalanced distribution of classes. This paper proposes a filter-based feature selection method to improve the classification performance of microarray datasets by selecting the significant features. Combining the concepts of rough sets, weighted rough set, fuzzy rough se...


Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...


A New Framework for Distributed Multivariate Feature Selection

Feature selection is considered an important issue in the classification domain. Selecting good features by maximum relevance to the class label and minimum redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized manner. In this paper, we suggest a distributed version of the mRMR featu...


Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Feature selection can be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples become extremely hard. As a result, neural networks and clustering a...


Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...


Matching of Polygon Objects by Optimizing Geometric Criteria

Unlike semantic criteria, geometric criteria perform differently on polygon feature matching in different vector datasets. Using these criteria to measure the similarity of two polygons does not yield the same results in all matchings. To achieve the best matching results, it is considered necessary to determine the optimal geometric criteria for each dataset....




Publication date: 2013